ABSTRACT

Multiple matrix sampling (MMS) procedures were
utilized to determine the necessary parameters of a Pearson Type I
curve. Empirical norms distributions were approximated by both the
Type I model and the negative hypergeometric model. Four existing
ITED norms distributions, two subtests and two grades, were
approximated by the MMS procedures. Two sampling designs for each
test-grade combination were studied. Comparison of approximations
obtained for the Type I curve and the negative hypergeometric curve
supported the use of the Type I curve for determining test score
distributions of large populations. (Author)
INTRODUCTION






Lord (1962) and others (Kleinke, 1972; Plumlee, 1964; Shoemaker,
1970) have contended that more representative samples of students could
be obtained for the national norms of standardized achievement tests
if less examinee time were requested. They proposed an item sampling
plan or a multiple matrix sampling (MMS) plan to reduce the amount of
time needed per examinee. To use matrix sampling for this purpose, it is
assumed that normative distributions conform closely to one or another
theoretical probability distribution. The normative testing enables
the testing agency to estimate the parameters of the assumed theoretical
probability model. Then, an estimate of the entire norms distribution
(i.e., the distribution obtained when all examinees take all items) is
derived from this theoretical model.

Past research in this area has primarily involved the use of the
negative hypergeometric distribution as the model for the estimated
norms distribution (Lord, 1962; Shoemaker, 1970). Recently, however,
Brandenburg and Forsyth (1973) have found that the empirical norms
distributions of certain types of standardized tests can be approximated
more adequately by using a Pearson Type I curve (Pearson and Johnson,
1968) rather than the negative hypergeometric model. The Brandenburg
and Forsyth (1973) study was not an MMS or item sampling study. They
utilized the entire norms distribution for each test to compute the
necessary parameters of both the negative hypergeometric model (first
two moments required) and the Pearson Type I model (first four moments
required).

When the two theoretical cumulative distributions were reproduced
using the moments of the empirical data, the Type I model was found to
fit the observed data more closely. Since the Pearson Type I model
requires estimates of the first four moments of the norms distribution and
since these higher moments of a distribution are known to have a high
degree of sampling error, it seemed reasonable to investigate the
superiority of the Pearson Type I model under MMS conditions. The primary
purpose of this study was to compare the adequacy of these two probability
models for approximating entire norms distributions when MMS procedures
were utilized to estimate the distribution parameters.

PROCEDURES

Multiple Matrix Sampling

A multiple matrix sampling experiment consists of administering samples
of items (i.e., subtests) from a pool of items (or a test) to samples of
examinees from some well-defined population. The random samples of examinees
are given either completely non-overlapping sets of items (i.e., the items
are sampled without replacement) or potentially overlapping sets of items
(i.e., the items are sampled without replacement for each subtest but with
replacement between subtests).
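
The non-overlapping case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_matrix_samples`, the seed, and the use of integer indices for items and examinees are hypothetical and are not taken from the study.

```python
import random

def draw_matrix_samples(n_items, n_examinees, k, t, n, seed=0):
    """Draw t non-overlapping matrix samples of k items and n examinees each.

    Items and examinees are both sampled without replacement, so no item
    and no examinee appears in more than one matrix sample.
    """
    if k * t > n_items or n * t > n_examinees:
        raise ValueError("design needs more items or examinees than exist")
    rng = random.Random(seed)  # seed is an arbitrary, hypothetical choice
    items = rng.sample(range(n_items), k * t)
    examinees = rng.sample(range(n_examinees), n * t)
    return [
        (sorted(items[j * k:(j + 1) * k]),
         sorted(examinees[j * n:(j + 1) * n]))
        for j in range(t)
    ]

# Example: a 36-item test, 16,867 examinees, four subtests of nine items,
# 500 examinees per subtest.
samples = draw_matrix_samples(n_items=36, n_examinees=16867, k=9, t=4, n=500)
```

Each element of `samples` pairs one subtest (a list of item indices) with the examinees who would take it.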

The basic purpose of MMS is to make inferences about the scores of
the population of examinees on the population of items. For example, in
curriculum evaluations, the evaluator may be interested in estimating
the mean score for the population on a number of different criterion
measures. Rather than give the entire set of instruments to
all students, he may use an MMS approach to estimate the means.

In order to make inferences about the total population of items
and examinees, it is necessary to extrapolate the information in each
matrix sample. Thus, for example, if in the curriculum evaluation
project mentioned above, ten samples (subtests) of items were given
to ten samples of students, the information from each sample is utilized
in some way to provide an estimate of the mean of all examinees on all
items. For a more detailed discussion of the mechanics of these
operations, the reader is referred to Shoemaker (1971a) and Knapp (1972).

Theoretical Models Utilized

This study was primarily concerned with the use of the MMS concept
to estimate the parameters of two theoretical probability models: Pearson
Type I and negative hypergeometric. With the numerical values of its
parameters thus specified, each model would then serve as an approximation
for the entire norms distribution.
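
As an illustration of how a two-moment model can be specified, the sketch below fits a negative hypergeometric score distribution from a mean and a variance. It assumes that the negative hypergeometric model corresponds to what is now usually called the beta-binomial distribution and that its two parameters are obtained by the method of moments; the function names are hypothetical, and the study's own fitting procedure may have differed.

```python
from math import exp, lgamma

def fit_negative_hypergeometric(mean, variance, n_items):
    """Method-of-moments parameters (a, b) for a beta-binomial on 0..n_items."""
    p = mean / n_items
    # Ratio of the observed variance to the binomial variance n*p*(1-p).
    r = variance / (n_items * p * (1 - p))
    if not 1 < r < n_items:
        raise ValueError("moments are not attainable by this model")
    s = (n_items - r) / (r - 1)  # s = a + b
    return p * s, (1 - p) * s

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def negative_hypergeometric_pmf(x, n_items, a, b):
    """Probability of raw score x under the fitted model."""
    log_comb = lgamma(n_items + 1) - lgamma(x + 1) - lgamma(n_items - x + 1)
    return exp(log_comb + log_beta(x + a, n_items - x + b) - log_beta(a, b))

# Example with the Q-9 mean and variance reported in Table 1 (36 items).
a, b = fit_negative_hypergeometric(12.143, 33.551, 36)
probs = [negative_hypergeometric_pmf(x, 36, a, b) for x in range(37)]
```

The list `probs` gives a relative frequency for every raw score from 0 to 36, which can then be cumulated and compared with the empirical norms.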

The Pearson Type I model requires the estimation of four parameters
(mean, variance, a skewness index and a kurtosis index). The estimation
of these quantities was accomplished primarily through the use of Lord's
(1960) formulae for estimating the moments of a K-item test from the
moments of a k-item test (K > k).

The first moment about zero ($\mu'_1$) and the next three central moments
($\mu_2$, $\mu_3$, $\mu_4$) were further adjusted to obtain unbiased
estimates of the population parameters when examinee sampling is also
assumed. These formulas are shown below, where $n$ is the number of
examinees:

Mean: $\hat{\mu}'_1 = \mu'_1$  (1)

Variance: $\hat{\mu}_2 = \dfrac{n}{n - 1}\,\mu_2$  (2)

Third Central Moment (Cramer, 1946): $\hat{\mu}_3 = \dfrac{n^2}{(n - 1)(n - 2)}\,\mu_3$  (3)

Fourth Central Moment (Cramer, 1946): $\hat{\mu}_4 = \dfrac{n(n^2 - 2n + 3)}{(n - 1)(n - 2)(n - 3)}\,\mu_4 - \dfrac{3n(2n - 3)}{(n - 1)(n - 2)(n - 3)}\,\mu_2^2$  (4)
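
The adjustments in Equations (2) through (4) can be sketched as follows. The coefficients used are the standard unbiased-estimator results attributed to Cramer (1946); treating them as the exact form applied in the study is an assumption, and the function names and the four-score example are hypothetical.

```python
from statistics import variance

def sample_central_moments(scores):
    """Second, third and fourth central moments with divisor n."""
    n = len(scores)
    mean = sum(scores) / n
    return tuple(sum((x - mean) ** r for x in scores) / n for r in (2, 3, 4))

def unbiased_central_moments(m2, m3, m4, n):
    """Adjust sample central moments for examinee sampling (needs n >= 4)."""
    if n < 4:
        raise ValueError("at least four examinees are required")
    d = (n - 1) * (n - 2) * (n - 3)
    mu2 = n / (n - 1) * m2
    mu3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    mu4 = n * (n ** 2 - 2 * n + 3) / d * m4 - 3 * n * (2 * n - 3) / d * m2 ** 2
    return mu2, mu3, mu4

# Hypothetical scores, used only to exercise the functions.
scores = [0, 1, 2, 3]
m2, m3, m4 = sample_central_moments(scores)
mu2_hat, mu3_hat, mu4_hat = unbiased_central_moments(m2, m3, m4, len(scores))
```

With 500 examinees per matrix sample, as in this study, the variance factor n/(n - 1) is about 1.002, so the adjustment to the second moment is small; the fourth-moment adjustment is more noticeable because it also subtracts a multiple of the squared variance.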

To specify a particular Pearson Type I curve, the four moments must be
converted to coefficients of skewness and kurtosis. For a single matrix
sample, the coefficients of skewness and kurtosis could be estimated as
follows (population coefficients are given also):

           Population                                    Matrix Sample

Skewness:  $\sqrt{\beta_1} = \mu_3 / \mu_2^{3/2}$  (5)      $\sqrt{\hat{\beta}_1} = \hat{\mu}_3 / \hat{\mu}_2^{3/2}$  (6)

Kurtosis:  $\beta_2 = \mu_4 / \mu_2^2$  (7)                $\hat{\beta}_2 = \hat{\mu}_4 / \hat{\mu}_2^2$  (8)

However, when several sets of matrix sample data are available
for estimating a given parameter, there exist at least two ways of
combining these data. The moments estimated from each matrix sample
could be combined to estimate the population second, third and fourth
moments. From these estimates, skewness and kurtosis indices could
be computed (average moment method). Or alternatively, estimates of
the skewness and kurtosis indices could be obtained from the moments of
each matrix sample and combined to yield overall estimates of the indices
(average ratio method). From the data obtained in the MMS experiment,
both methods appeared to yield equally good estimates of the coefficients.
However, when the Type I and negative hypergeometric curves were
constructed from the two sets of coefficients, the "average moment" method
gave better results. The results reported in this study were those
associated with the "average moment" method.
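
The difference between the two ways of combining matrix samples can be sketched as below. The text does not say how the per-sample quantities were combined, so the simple arithmetic mean used here is an assumption; the function names and the moment values are hypothetical and serve only to show that the two methods need not agree.

```python
from statistics import mean

def skew_kurt(mu2, mu3, mu4):
    """Skewness and kurtosis indices from central moments."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

def average_moment_method(moment_sets):
    """Average the moments over matrix samples, then form the indices."""
    mu2 = mean(m[0] for m in moment_sets)
    mu3 = mean(m[1] for m in moment_sets)
    mu4 = mean(m[2] for m in moment_sets)
    return skew_kurt(mu2, mu3, mu4)

def average_ratio_method(moment_sets):
    """Form the indices within each matrix sample, then average them."""
    indices = [skew_kurt(*m) for m in moment_sets]
    return mean(i[0] for i in indices), mean(i[1] for i in indices)

# Hypothetical (mu2, mu3, mu4) estimates from two matrix samples.
moment_sets = [(30.0, 140.0, 3000.0), (36.0, 180.0, 4200.0)]
by_moment = average_moment_method(moment_sets)
by_ratio = average_ratio_method(moment_sets)
```

Because the indices are nonlinear functions of the moments, `by_moment` and `by_ratio` coincide only when every matrix sample yields the same moments.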
Data Source

The available score distributions for this study were based on the
Iowa Tests of Educational Development (Lindquist and Feldt, 1970).
Thirty-six distributions of scores (9 tests at 4 grades) obtained from the
1971 Iowa high school testing program were available. From this pool of
data, the scores from two grades, 9 and 12, and two tests, Quantitative
Thinking (Q) and Use of Sources of Information (SI), were chosen for study.
These four distributions represented extremes in skewness and a sizeable
range in the kurtosis index. The descriptive data related to these
distributions are given in Table 1.
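
One way to see where these four distributions fall within the Pearson system is to compute Pearson's criterion from the skewness and kurtosis indices in Table 1. The sketch below uses the standard criterion formula, under which a negative value indicates a Type I curve; the study does not state that this check was performed, so it is offered only as an illustration, and the function name is hypothetical.

```python
def pearson_kappa(sqrt_beta1, beta2):
    """Pearson's criterion; a negative value indicates a Type I curve."""
    beta1 = sqrt_beta1 ** 2
    return (beta1 * (beta2 + 3) ** 2
            / (4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)))

# Skewness and kurtosis indices as reported in Table 1.
table1 = {
    "Q-9": (0.8559, 3.4573),
    "Q-12": (0.3402, 2.1100),
    "SI-9": (0.2027, 2.2319),
    "SI-12": (-0.4918, 2.3925),
}

kappas = {name: pearson_kappa(s, k) for name, (s, k) in table1.items()}
for name, kappa in kappas.items():
    print(name, "Type I region" if kappa < 0 else "outside Type I region")
```

For all four distributions the term 2*beta2 - 3*beta1 - 6 is negative, so the criterion is negative in each case.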



(Insert Table 1 about here)

Item and Examinee Sampling Designs

This investigation was a post mortem, or post hoc, experiment. That
is, the norms distributions were known, and from these distributions items
and examinees were selected using several MMS designs.

Four restrictions were placed on the sampling designs. First,
the items and examinees were sampled without replacement for all
matrix samples. This restriction was made to limit the scope and cost
of the study, and in recognition of Shoemaker's results (1971a, 1971b,
and 1972). Shoemaker has shown only small differences between
overlapping and nonoverlapping item sampling. Secondly, it was arbitrarily
decided that the number of items in each matrix should be no more than
1/4 of the total number of items. This meant that the number of matrix
samples would be at least four. Thirdly, within each design the number
of items (k) and the number of examinees (n) were constant for each
matrix sample. Finally, the number of examinees for each matrix sample
was set at 500. Although most MMS studies have not utilized sample sizes
that large, such a number of examinees per matrix sample did not seem
unreasonable since the purpose was to approximate entire norms distributions.

Given these restrictions, the two tests chosen for this study
exhibit different problems in choosing item sampling designs. The
Quantitative test is composed of 36 items, which is easily divisible into
the following k by t (number of items by number of subtests) designs:
4 x 9, 6 x 6, 9 x 4. Following Shoemaker's suggestion (personal
communication, 1972), the 4 x 9 plan for the Quantitative test was
eliminated. The other two plans were implemented for grade 9 and grade 12
populations.

The 46-item Sources of Information test did not lend itself to a
similar variety of simple designs. If it is required that every item be
placed in one or another matrix, a test of this length permits only two
sampling plans: 2 x 23 and 23 x 2. The first of these has too few items
per matrix to make possible the necessary estimates of moments, and
the second has more than the maximum allowable proportion of items
per subtest. On the basis of published results (Shoemaker, 1971b), it
was concluded that the random exclusion of one or two items from all
subtests would not seriously affect the accuracy of the estimates of
the moments. Thus, two sampling plans were adopted: four subtests of
eleven items each (11 x 4) and five subtests of nine items each (9 x 5).
It should be noted that despite the exclusion of one or two items, the
distribution moments that were estimated from the data were for a 46-item
test. A summary of the designs for both tests is presented in Table 2.
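
The design restrictions above can be checked with a short enumeration. The sketch lists every k by t design that uses all items of a test with at most one quarter of the items per subtest. The lower bound of four items per subtest is an assumption, chosen because the text rules out two-item subtests as too short for estimating the moments; the function names are hypothetical.

```python
def exact_designs(n_items, max_fraction=0.25, min_items=4):
    """All (k, t) designs that use every item exactly once.

    k is the number of items per subtest and t the number of subtests.
    min_items=4 is an assumed lower bound, not a value from the study.
    """
    return [
        (k, n_items // k)
        for k in range(min_items, n_items + 1)
        if n_items % k == 0 and k <= n_items * max_fraction
    ]

def items_omitted(n_items, k, t):
    """Items left out when t subtests of k items do not cover the test."""
    return n_items - k * t

print(exact_designs(36))         # [(4, 9), (6, 6), (9, 4)]
print(exact_designs(46))         # []
print(items_omitted(46, 11, 4))  # 2
print(items_omitted(46, 9, 5))   # 1
```

The empty result for the 46-item test reflects the situation described above: no exact design satisfies the restrictions, so one or two items must be excluded.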



(Insert Table 2 about here) 



Five replications of each sampling design were carried out in order
to provide an indication of the variability of the moment estimates
and the corresponding variability of the approximations to the norms
distribution. These five replications of each of two sampling plans
produced 10 approximations of each norms distribution. Since four norms
distributions had been chosen for study, there was a total of 40
replications of the MMS experiment.

Evaluation of Approximations

After the necessary parameters were estimated from the MMS procedure,
the resultant Pearson Type I and negative hypergeometric curves were
compared to the empirical norms distributions. Four measures of the
discrepancy between the theoretical and empirical distributions were
calculated: a) the maximum absolute difference in the relative
frequency for any score interval; b) the mean absolute difference in
relative frequency for all intervals; c) a chi-square type index
calculated on relative frequencies (Lord's D index, 1962); and d) the
maximum absolute difference in the relative cumulative frequency (rcf).
For the purposes of this study, the last of these indices (referred to
as MDC) was considered to be the most accurate representation of the
results. As a consequence, it is the only index discussed here. Results
for the other indices are quite similar, and they may be found in
Brandenburg (1972).

The index MDC is defined as follows:

$$\mathrm{MDC} = \max_{i} \left| \mathrm{rcf}_{E}(X_i) - \mathrm{rcf}_{T}(X_i) \right|$$

where the maximum is taken over all empirical score points $X_i$,
$\mathrm{rcf}_{E}(X_i)$ is the rcf at $X_i$ for the empirical distribution,
and $\mathrm{rcf}_{T}(X_i)$ is the rcf at $X_i$ for the theoretical
distribution. Thus, MDC represents the maximum ordinate discrepancy
between the theoretical and empirical ogives. Except for the location of
the decimal point, MDC equals the maximum PR difference over all score
points between the theoretical curve and the "true" curve.
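
A minimal sketch of the MDC computation is given below. It assumes that both distributions are supplied as relative frequencies listed over the same ordered score points; the function name and the four-point example are hypothetical.

```python
from itertools import accumulate

def mdc(empirical_rel_freq, theoretical_rel_freq):
    """Maximum absolute difference between two relative cumulative
    frequency (rcf) curves defined on the same ordered score points."""
    if len(empirical_rel_freq) != len(theoretical_rel_freq):
        raise ValueError("distributions must cover the same score points")
    empirical_rcf = accumulate(empirical_rel_freq)
    theoretical_rcf = accumulate(theoretical_rel_freq)
    return max(abs(e - t) for e, t in zip(empirical_rcf, theoretical_rcf))

# Hypothetical four-point distributions.
empirical = [0.10, 0.20, 0.40, 0.30]
theoretical = [0.15, 0.25, 0.35, 0.25]
example_mdc = mdc(empirical, theoretical)  # approximately 0.10
```

In the example the ogives differ most at the second score point, where the cumulative frequencies are .30 and .40, so an examinee's percentile rank would be off by at most about 10 points.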

RESULTS AND DISCUSSION
The MDC indices obtained from the five replications for each
test-grade-sampling design combination and each theoretical curve
are presented in Table 3. In parentheses preceding the data for each
set of ten replications is the MDC index obtained when the moments of
the entire norms distribution were used to define the theoretical ogive,
and this model was compared to the empirical ogive. Each value is
designated as an "original MDC."

(Insert Table 3 about here) 

Two observations about these original MDC indices should be
made before additional results are discussed. First, each original
Type I MDC index is less than the corresponding negative hypergeometric
index. Second, the differences between the two original indices
are greater for the Q-distributions than for the S-distributions.

Given the above observations, it was not surprising to find that
in 19 of 20 replications related to the Q-test, the Type I MDC index
was less than the corresponding negative hypergeometric index. In
13 of these 19 replications, the difference between the observed
Type I index and observed negative hypergeometric index was smaller than
the difference between the original indices. However, the median
difference was still approximately -.027 for Q-9 and -.021 for Q-12.
Thus, under the sampling design restrictions of this investigation
for the Q test, the results of the MMS experiment strongly support
the utilization of the Type I model for approximating norms
distributions rather than the negative hypergeometric model.

The results related to the S-test are not nearly as conclusive.
For distribution S-9, the original difference in MDC indices was
-.004108. Thus, for all practical purposes, both models were
providing similar approximations. In only 5 of the 10 replications for
this test was the Type I model better. Given the size of the original
difference, such a result was not unexpected. However, it does
provide some evidence that, under the sampling conditions of this study,
the estimation of four moments rather than two does not introduce
an excessive amount of error in the approximations.

For distribution S-12, 6 of the 10 replications yielded better
MDC indices for the Type I model. Since the original MDC difference
(-.015498) was in favor of the Type I model, this result was expected
also.

In summary, the empirical data of this study seem to support
the utilization of the Type I model. Of course, generalizations
beyond the restrictions of this study are difficult to make. The
present study concerned a particular type of achievement test data.
Without resort to item sampling, it had been established that the norms
distributions for these tests were better approximated by a Type I model.
Also, it is possible that the negative hypergeometric model (which
requires estimates of only two parameters) may provide "good" approximations
with smaller sample sizes and hence decrease the cost of obtaining the MMS
data. It is possible that other sampling designs may produce different
results. Finally, the distributions studied here were somewhat "atypical"
in terms of the original MDC values. In a study of 90 empirical
distributions (4 of which are used here), Brandenburg and Forsyth (1973)
found that the median MDC index (i.e., original MDC) for the Type I fit was
.015, and the median MDC for the negative hypergeometric fit was .033.
Thus, the four distributions examined in this study had above "average"
MDC indices. Perhaps other distributions with smaller original MDC
indices would not produce similar results. The effect of the above
factors must, of course, be examined before any general conclusions
regarding the Type I model can be made. Nevertheless, the results of the
present study do indicate that future investigations into the use of MMS
procedures for the purpose of approximating norms distributions should
include the Type I model.

ADDITIONAL COMMENTS
General conclusions involving test differences or grade differences
from Type I approximations are difficult to make from Table 3. There are,
however, noticeable differences between designs within each test-grade
combination. For Q, the 9 x 4 design yielded better results than the
6 x 6 design. This does not substantiate Shoemaker's (1971a) statement
that all sampling designs with equal values of the product (t)(k)(n) give
essentially the same standard error for estimated parameters. Shoemaker,
however, based his inference on mean and variance estimation, whereas
our MDC data involved the estimation of four parameters.

Shoemaker (personal communication, 1972) indicated that a greater
number of items per subtest may be used to better estimate the
higher-order moments. This is true for the MMS results for Q. However, it
is not true for the MMS results for S; better results (lower MDC values)
were achieved for the 9 x 5 design compared to the 11 x 4 design. But
the interpretation of this reversal is confounded somewhat by factors
affecting these results and not the Q results. Although the 11 x 4 design
uses 2 more items per subtest, it also omits 2 items, whereas the 9 x 5
design omits only 1 item. Furthermore, the 9 x 5 design has 500 more
observations per replication.

Also, it may be observed from Table 3 that 7 of the 40 MMS-derived
Type I approximations and 18 of the 40 MMS-derived negative hypergeometric
approximations yielded MDC indices less than their respective original MDC
indices obtained from approximations using the norms distribution moments.
This means that the use of population moments does not guarantee a
"best-fitting" curve. In general, however, the original MDC index was for
all practical purposes a lower bound for the obtained MDC indices from MMS.
The Type I approximations of the four norms distributions from the
MMS experiments had MDC indices about .015 larger than their corresponding
original MDC indices. Thus, it might be hypothesized that even for relatively
good original MDC indices (say, less than .015) the MDC indices from the MMS
technique would be near .030. If this hypothesis is assumed to be true, and
if it is also assumed that the possibility of good norms approximations is
greatest when a post mortem type design is utilized, the MMS results may
not be very encouraging. On the other hand, if biased populations are
obtained via the traditional standardization procedures, then these results
may be viewed quite positively.

Also, it should be noted that the MDC indices were computed on
the basis of the raw score norms distributions before any smoothing
had been undertaken. Since most test publishers would smooth the
obtained raw score distributions before assigning percentile ranks,
it is possible that the norms distribution mapped from MMS
techniques would approximate the smoothed distributions better than the
unsmoothed distributions.



Table 1
Descriptive Data for the Four Tests

Test &   No. of             Mean        Variance    Skewness            Kurtosis
Grade    Items    N         ($\mu'_1$)  ($\mu_2$)   ($\sqrt{\beta_1}$)  ($\beta_2$)

Q-9      36       16,867    12.143      33.551       0.8559             3.4573
Q-12     36       11,581    17.820      66.489       0.3402             2.1100
SI-9     46       16,867    22.304      62.949       0.2027             2.2319
SI-12    46       11,581    29.058      74.220      -0.4918             2.3925
Table 2
MMS Sampling Designs (t/k/n)

                                       Grade
Test                           9              12

Quantitative (Q)               6/6/500*       6/6/500
                               4/9/500        4/9/500

Sources of Information (SI)    5/9/500        5/9/500
                               4/11/500       4/11/500

*The first number represents the number of subtests, the second
the number of items on each subtest, and the third
the number of examinees taking each subtest.
Table 3
MDC Values

              k x t                                                 (TI-NH)
Test-Grade    Design   Replication   MDC-Type I    MDC-Neg. Hyp.    DIFF

Q-9                                  (.021232)*    (.047654)        (-.026422)
              6 x 6    1             .029929       .058785          -.028850
                       2             .035248       .042731          -.007483
                       3             .045532       .062194          -.016662
                       4             .044109       .034803          +.009306
                       5             .028776       .060987          -.032209
              9 x 4    1             .019368       .053258          -.033890
                       2             .039317       .067945          -.028628
                       3             .014036       .043052          -.029016
                       4             .028100       .054013          -.025913
                       5             .033013       .052195          -.019182

Q-12                                 (.021541)     (.050193)        (-.028652)
              6 x 6    1             .036305       .046241          -.009936
                       2             .039912       .064887          -.024975
                       3             .030392       .058305          -.027913
                       4             .028969       .057338          -.028369
                       5             .031347       .048560          -.017213
              9 x 4    1             .020191       .050880          -.030689
                       2             .053794       .056605          -.002811
                       3             .027561       .051635          -.024074
                       4             .033359       .049260          -.015901
                       5             .032023       .047143          -.015120

*Numbers in parentheses are the "original" MDC values calculated when
population moments and the given models were used to fit the empirical norms.
Table 3 (cont.)
MDC Values

              k x t                                                 (TI-NH)
Test-Grade    Design   Replication   MDC-Type I    MDC-Neg. Hyp.    DIFF

S-9                                  (.042927)*    (.047035)        (-.004108)
              11 x 4   1             .066313       .068373          -.002060
                       2             .071241       .055280          +.015961
                       3             .047555       .053200          -.005645
                       4             .052957       .040040          +.012917
                       5             .036337       .032900          +.003437
              9 x 5    1             .040228       .046565          -.006337
                       2             .046792       .042747          +.004045
                       3             .036583       .041477          -.004894
                       4             .044829       .043297          +.001532
                       5             .040741       .043557          -.002816

S-12                                 (.017199)     (.032697)        (-.015498)
              11 x 4   1             .031203       .047869          -.016666
                       2             .043915       .043399          +.000516
                       3             .026028       .029613          -.003585
                       4             .034280       .055394          -.021114
                       5             .034280       .028068          +.006212
              9 x 5    1             .019118       .034215          -.015097
                       2             .035228       .034104          +.001124
                       3             .050286       .042598          +.006688
                       4             .017334       .032830          -.015496
                       5             .025612       .030752          -.005140

*Numbers in parentheses are the "original" MDC values calculated when
population moments and the given models were used to fit the empirical norms.